The chemistry of insulin

Nobel Lecture, December 11, 1958 

It is a great pleasure and privilege for me to give an account of my work on
protein structure and I am deeply sensitive of the great honour that has 
been done to me in recognizing my work in this way. Since the work on 
insulin has extended over about 12 years it will be necessary to give a some- 
what simplified account and to omit most of the work that did not contrib- 
ute directly to the main problem, the determination of the structure of a 
protein. 

In 1943 the basic principles of protein chemistry were firmly established. 
It was known that all proteins were built up from amino acid residues bound 
together by peptide bonds to form long polypeptide chains. Twenty dif- 
ferent amino acids are found in most mammalian proteins and by analytical 
procedures it was possible to say with reasonable accuracy how many residues
of each one were present in a given protein. Practically nothing, however,
was known about the relative order in which these residues were
arranged in the molecules. This order seemed to be of particular importance, 
since although all proteins contained approximately the same amino acids 
they differed markedly in both physical and biological properties. It was thus 
concluded that these differences were dependent on the different arrange- 
ment of the amino-acid residues in the molecules. Although very little was 
known about amino-acid sequence, there was much speculation in this field. 
The most widely discussed theory was that of Bergmann and Niemann who 
suggested that the amino acids were arranged in a periodic fashion, the res- 
idues of one type of amino acid occurring at regular intervals along the 
chain. On the other extreme there were those who suggested that a pure 
protein was not a chemical individual in the classical sense but consisted of a 
random mixture of similar individuals. 

Due largely to the work of Chibnall and his colleagues 1, insulin had been
studied in considerable detail. It had a somewhat simpler composition than
most proteins, in that two of the commonly occurring amino acids, tryptophan
and methionine, were absent, and an accurate analysis was available.
Moreover, using the Van Slyke procedure, Chibnall had shown that insulin
was peculiar in having a high content of free α-amino groups. This indicated
that it was composed of relatively short polypeptide chains since free
α-amino groups would be found only on those residues (the N-terminal
residues) which were present at one end of a chain. Thus the number of chains
could be determined from the number of these N-terminal residues. The 
nature of one of these N-terminal residues was in fact known. Jensen & 
Evans 2 had shown that the phenylhydantoin of phenylalanine could be iso- 
lated from an acid hydrolysate of insulin that had been treated with phenyl- 
isocyanate, thus indicating that phenylalanine was at the end of one of the 
chains. At that time this was the only case where the position of an amino 
acid in a protein was known. 

There was considerable doubt about the actual molecular weight of insulin 
and hence the number of amino acid residues present. Values varying from
36,000 to 48,000 were reported by physical methods but it was shown by 
Gutfreund 3 that these high values were due to aggregation and it was sug- 
gested that the real molecular weight or subunit was 12,000. This indicated 
that there were about 100 residues in the molecule. More recently Harfenist
& Craig 4 have shown that the actual value is about 6,000; however during
most of our work it was believed to be 12,000. 

In order to study in more detail the free amino groups of insulin and other 
proteins, a general method for labelling them was worked out 5. This was
the dinitrophenyl (or DNP) method. The reagent used was 1:2:4 fluorodinitrobenzene
(FDNB) which reacts with the free amino groups of a protein
or peptide to form a DNP derivative: 



(NO2)2C6H3•F  +  NH2•CHR•CO-   →   (NO2)2C6H3•NH•CHR•CO-  +  HF

    FDNB           Protein              DNP-protein



The reaction takes place under mildly alkaline conditions which normally 
do not cause any breakage of the peptide bonds. 

The DNP-protein is then subjected to hydrolysis with acid which splits 
the peptide bonds in the chain, leaving the N-terminal residue in the form of 
its DNP-derivative. 
The DNP-amino acids are bright yellow substances and can be separated
from the unsubstituted amino acids by extraction with ether. They could be 
fractionated by partition chromatography, a method which had just been 
introduced by Gordon, Martin & Synge 6 at that time. The DNP-amino 
acids could then be identified by comparison of their chromatographic rates 
with those of synthetic DNP-derivatives. In the original work on insulin, 
silica-gel chromatography was used, though more recently other systems, 
particularly paper chromatography, have been found more satisfactory. 
Once the DNP-derivatives had been separated and identified, they could be
estimated colorimetrically.

When the method was applied to insulin, three yellow DNP-derivatives 
were found in the hydrolysate of the DNP-insulin. One of these was not 
extracted into ether and was ε-DNP-lysine which was formed by reaction
of the FDNB with the free ε-amino group of lysine residues which are
bound normally within the polypeptide chain. The others were identified as 
DNP-phenylalanine and DNP-glycine, and estimation showed that there 
were two residues of each assuming a molecular weight of 12,000. This sug- 
gested to us that insulin was composed of four polypeptide chains, two with 
phenylalanine and two with glycine end-groups. This method has now been 
applied widely to many proteins and peptides, and together with the Edman 
phenylisothiocyanate method is the standard method for studying N-ter- 
minal residues. In general it has been found that the chains of other proteins 
are much longer than those of insulin. All pure proteins appear to have only 
one or two N-terminal residues. 
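
The end-group arithmetic in this paragraph can be written out as a short Python sketch. The function name and the dictionary layout are illustrative assumptions and are not part of the original work; the residue counts (two DNP-phenylalanine and two DNP-glycine per assumed molecular weight of 12,000) are the ones quoted above.

```python
def count_chains(n_terminal_residues):
    """Each open polypeptide chain has exactly one N-terminal residue,
    so the number of chains is the total number of N-terminal residues."""
    return sum(n_terminal_residues.values())


# N-terminal residues estimated per molecule, assuming a molecular weight of 12,000.
end_groups_12000 = {"Phe": 2, "Gly": 2}
chains_12000 = count_chains(end_groups_12000)

# If the true molecular weight is 6,000 (the value reported later by
# Harfenist & Craig), the same estimation gives half as many of each end group.
end_groups_6000 = {residue: n // 2 for residue, n in end_groups_12000.items()}
chains_6000 = count_chains(end_groups_6000)

print(chains_12000, chains_6000)
```

The sketch only restates the counting argument: the conclusion about the number of chains is exactly as good as the assumed molecular weight.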

It seemed probable that the chains of insulin were joined together by the 
disulphide bridges of cystine residues. Insulin is relatively rich in cystine and 
this was the only type of cross-linkage that was definitely known to occur 
in proteins. It was thus next attempted to separate the peptide chains by 
splitting the disulphide bridges. Earlier attempts to do this by reduction to 
-SH derivatives had not proved successful and had given rise to insoluble 
products which were probably due to some type of polymerization. More
satisfactory results were obtained by oxidation with performic acid 7. The
cystine residues were converted to cysteic acid residues thus breaking the
cross-links:

-NH • CH • CO-                        -NH • CH • CO-
      |                                     |
      CH2                                   CH2
      |                                     |
      S                                     SO3H
      |          H • COOOH
      S        ------------->               SO3H
      |                                     |
      CH2                                   CH2
      |                                     |
-NH • CH • CO-                        -NH • CH • CO-

cystine residue                       cysteic acid residues
(CyS • SCy)                           (CySO3H)

Performic acid also reacts with methionine and tryptophan residues, the two 
amino acids which fortunately were absent from insulin. 

From the oxidized insulin two fractions could be separated by precipita- 
tion methods. One (fraction A) contained glycine and the other (fraction 
B) phenylalanine N-terminal residues. Fraction A was acidic and had a simpler
composition than insulin, in that the six amino acids: lysine, arginine,
histidine, phenylalanine, threonine, and proline, were absent from it. It thus
had no basic amino acids, which were found only in fraction B. From a 
quantitative determination of the end groups it was concluded that fraction 
A contained about 20 residues per chain, four of these being cysteic acid and 
fraction B had 30 residues, two of which were cysteic acid. Since the yield 
of each fraction was greater than 50% in terms of the N-terminal residues
present and since they appeared to be homogeneous it seemed likely that 
there was only one type of glycyl chain and one type of phenylalanyl chain. 
This was confirmed by a study of the N-terminal sequences 8.

When the DNP derivative of fraction B was subjected to complete acid 
hydrolysis, DNP-phenylalanine was produced. If however it was subjected 
to a milder acid treatment so that only a fraction of the peptide bonds were 
split, DNP-phenylalanyl peptides were produced which contained the amino
acid residues near to the N-terminal end and by an analysis of these peptides
it was possible to determine the N-terminal sequence to four or five
residues along the chain. The results with fraction B are shown in Table 1. 
It was concluded from these results that all the N-terminal phenylalanine 
residues of insulin were present in the sequence Phe • Val • Asp • Glu. This 
suggested that if there were in fact two phenylalanyl chains, then these two 
were identical. Similar results were obtained with fraction A, and it was
shown that the N-terminal glycine residues were present in the sequence
Gly • Ileu • Val • Glu • Glu.

Table 1.

Peptide   Products of complete     Products of partial    Structure                 Yield from
          hydrolysis of peptide    hydrolysis                                       DNP-insulin*

B1        DNP-phenylalanine        -                      DNP-Phe                   13

B2        DNP-phenylalanine        B1                     DNP-Phe • Val             16
          Valine

B3        DNP-phenylalanine        B1, B2                 DNP-Phe • Val • Asp       13
          Valine
          Aspartic acid

B4        DNP-phenylalanine        B1, B2, B3             DNP-Phe • Val • Asp • Glu 30
          Valine
          Aspartic acid
          Glutamic acid

Other bands giving B4 on partial hydrolysis                                         20

Total                                                                               92

* Moles peptide as per cent of total N-terminal phenylalanine residues of insulin.
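
The way an N-terminal sequence follows from a nested series of DNP-peptides, as in Table 1, can be sketched in Python. This is a minimal illustration under stated assumptions: the function name and the use of sets for the compositions are hypothetical, and a set cannot represent a residue that occurs twice in the same peptide, which does not arise in peptides B1 to B4.

```python
def n_terminal_sequence(compositions):
    """Order residues from a nested series of DNP-peptide compositions.

    Each composition is the set of residues found on complete hydrolysis.
    Taken in order of size, every peptide must contain the previous one plus
    exactly one new residue, which is then the next residue along the chain.
    """
    sequence = []
    seen = set()
    for composition in sorted(compositions, key=len):
        new = set(composition) - seen
        if not seen <= set(composition) or len(new) != 1:
            raise ValueError("peptides do not form a nested series")
        sequence.append(new.pop())
        seen = set(composition)
    return sequence


# Compositions of peptides B1 to B4 from Table 1 (the DNP group marks Phe).
table_1 = {
    "B1": {"Phe"},
    "B2": {"Phe", "Val"},
    "B3": {"Phe", "Val", "Asp"},
    "B4": {"Phe", "Val", "Asp", "Glu"},
}

sequence = n_terminal_sequence(table_1.values())
print(" . ".join(sequence))
```

Because every peptide carries the DNP group, all of them start at the same N-terminal residue, which is why composition alone is enough to fix the order here.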



These results, besides giving information about the position of certain res- 
idues in the polypeptide chains, showed for the first time that the molecule 
was composed of only two types of chains and that if the molecular weight 
was 12,000 as was then believed, then the molecule was built up of two 
identical halves. The other alternative, which was later shown to be the case, 
was that the actual molecular weight was 6,000. In any case the structural 
problem was somewhat simplified since we were now concerned with deter- 
mining the sequence in two chains containing 20 and 30 residues respectively. 

The main technical problem was the fractionation of the extremely com- 
plex mixtures that resulted from partial hydrolysis of a protein. However 
Consden, Gordon, Martin & Synge 9 had shown that small peptides could be 
well fractionated by paper chromatography and had determined the se- 
quence in the pentapeptide "gramicidin-S" from the composition of peptides 
produced on acid hydrolysis. 

At this point (1949) I was joined by Dr. Hans Tuppy who came to work 
in Cambridge for a year. Although we did not seriously envisage the possibility
of being able to determine the whole sequence of one of the chains
within a year, it was considered worth while to investigate the small peptides 
from an acid hydrolysate using essentially the methods that had been applied 
to "gramicidin-S". Studies were initiated on both the chains at the same time 
but it soon became clear that there would be more difficulties with fraction
A, although it was the shorter chain, and the work on fraction B progressed
so favourably and Tuppy worked so hard that by the end of the year we
were virtually able to deduce the whole of the sequence of its 30 residues 10.

Fraction B was subjected to partial hydrolysis with acid. Since the mixture 
was too complex for direct analysis by paper chromatography it was nec- 
essary to carry out certain preliminary group separations in order to obtain 
fractions containing 5-20 peptides that could then be separated on paper. 
This was accomplished by ionophoresis, ion-exchange chromatography, and 
adsorption on charcoal. These simplified mixtures were then fractionated by 
two-dimensional paper chromatography. The peptide spots were cut out 
and the material eluted from the paper, subjected to complete hydrolysis and 
analysed for its constituent amino acids. Another sample of the peptide was 
then investigated by the DNP technique to determine the N-terminal residue.
Table 2 illustrates the results obtained with a very acidic fraction obtained
by ion-exchange chromatography. This contained only peptides of
cysteic acid. Since there are only two such residues in fraction B all these
peptides must fit into two sequences. The way in which the two sequences
Leu • CySO3H • Gly and Leu • Val • CySO3H • Gly were deduced from the
results obtained with the peptides is illustrated in the table.

Table 2. Cysteic acid peptides identified in a partial acid hydrolysate of fraction B.
(The inclusion of residues in brackets indicates that their relative order is not known.)

                      CySO3H • Gly                  CySO3H • Gly
                      Val • CySO3H
                                                    Leu • CySO3H
                      Val • (CySO3H, Gly)
                                                    Leu • (CySO3H, Gly)
                      Leu • (Val, CySO3H)
                      Leu • (Val, CySO3H, Gly)

Sequences deduced     Leu • Val • CySO3H • Gly      Leu • CySO3H • Gly

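
The fitting of the cysteic acid peptides of Table 2 into two sequences can be checked with a short Python sketch. The representation is an assumption made for illustration: each peptide is a pair of lists, the residues whose order is known followed by the bracketed residues whose relative order is not known, and `fits` is a hypothetical helper name.

```python
from collections import Counter


def fits(peptide, sequence):
    """Return True if the peptide could be a contiguous fragment of the sequence.

    peptide = (ordered, bracketed): residues in known order, followed by
    residues whose relative order is not known.
    """
    ordered, bracketed = peptide
    length = len(ordered) + len(bracketed)
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        head, tail = window[:len(ordered)], window[len(ordered):]
        if head == ordered and Counter(tail) == Counter(bracketed):
            return True
    return False


CYS = "CySO3H"
PEPTIDES = {
    "CySO3H.Gly": ([CYS, "Gly"], []),
    "Val.CySO3H": (["Val", CYS], []),
    "Leu.CySO3H": (["Leu", CYS], []),
    "Val.(CySO3H,Gly)": (["Val"], [CYS, "Gly"]),
    "Leu.(CySO3H,Gly)": (["Leu"], [CYS, "Gly"]),
    "Leu.(Val,CySO3H)": (["Leu"], ["Val", CYS]),
    "Leu.(Val,CySO3H,Gly)": (["Leu"], ["Val", CYS, "Gly"]),
}
SEQ_1 = ["Leu", "Val", CYS, "Gly"]
SEQ_2 = ["Leu", CYS, "Gly"]

# For every peptide, list which of the two deduced sequences can accommodate it.
assignment = {
    name: [label for label, seq in (("1", SEQ_1), ("2", SEQ_2)) if fits(pep, seq)]
    for name, pep in PEPTIDES.items()
}
for name, labels in assignment.items():
    print(name, "->", labels)
```

Every peptide fits at least one of the two sequences, and Leu • CySO3H fits only the second, which is why a single sequence cannot account for all the peptides.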
In this way about 45 peptides were identified in various fractions of the
partial acid hydrolysate and the following five sequences were deduced as
being present in the phenylalanine chain.

1. Phe • Val • Asp • Glu • His • Leu • CySO3H • Gly (N-terminal sequence).

2. Gly • Glu • Arg • Gly.

3. Thr • Pro • Lys • Ala.

4. Tyr • Leu • Val • CySO3H • Gly.

5. Ser • His • Leu • Val • Glu • Ala.

These five sequences contain all but four of the amino acid residues of frac- 
tion B. It was not possible to determine from the small peptides derived 
from acid hydrolysates the position of the remaining four residues or how 
the above five sequences were joined together. There were two reasons for 
this. Firstly there was considerable technical difficulty in fractionating the 
peptides containing two or more of the non-polar residues such as tyrosine 
or leucine. It happened that these residues were grouped together in the chain 
(see below) and gave rise to a mixture of peptides that moved fast on paper 
chromatograms and were not well resolved. The second difficulty was due 
to the great lability to acid of the bonds involving the amino groups of the 
serine and threonine residues. It was never possible to find a peptide containing
such a bond and hence to know what residue preceded the serine and
threonine.

It was thus necessary to use another method of hydrolysis that would 
show a different specificity from concentrated acid. Hydrolysates prepared 
by the action of dilute acid at high temperatures or of alkali were studied but 
yielded little further information. Much more successful however was the 
use of proteolytic enzymes 11. Initially we had refrained from using them
since it was considered that they might bring about re-arrangement of the 
peptide bonds by transpeptidation or actual reversal of hydrolysis. Sub- 
sequent work has however shown that this is not a very serious danger and 
in fact proteolytic enzymes are the most useful hydrolytic agent for studies 
of amino acid sequences. 

Proteolytic enzymes are much more specific than is acid since only a few 
of the peptide bonds are susceptible. They give rise to larger peptides which 
in general are more difficult to fractionate by paper chromatography. How- 
ever there are relatively few of them so that the mixtures are less complex. 
In this initial work we used essentially the same methods for studying the 
enzymic peptides as we had used for the acid ones, depending largely on
paper chromatography for fractionation, although more recently it has been 
shown that better separations can be obtained by ion-exchange chromatog- 
raphy and by ionophoresis. 

As an example we may consider a peptide Bp3 obtained by the action of 
pepsin. It had the following composition: Phe • (CySO3H, Asp, Glu, Ser, Gly,
Val, Leu, His) of which the most important components are aspartic acid 
and serine since they occur only once in the chain. Aspartic acid is present 
only in the N-terminal sequence 1 and serine is in sequence 5. This shows 
that all of sequence 1 and at least the N-terminal part of sequence 5 are present
in peptide Bp3. That none of the other sequences are present follows from 
the fact that Bp3 contains no arginine (sequence 2), threonine, proline, or 
lysine (sequence 3) or tyrosine (sequence 4). One may thus conclude that 
the two sequences are joined together. By studying other peptides obtained 
by the action of pepsin, trypsin and chymotrypsin it was possible to find out
how the various sequences were arranged and to deduce the complete se- 
quence of the phenylalanyl chain which is shown below: 

Phe • Val • Asp • Glu • His • Leu • CySO3H • Gly • Ser • His • Leu • Val • Glu •
Ala • Leu • Tyr • Leu • Val • CySO3H • Gly • Glu • Arg • Gly • Phe • Phe •
Tyr • Thr • Pro • Lys • Ala.
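
The reasoning applied to peptide Bp3 can be retraced in a short Python sketch. It is an illustration under stated assumptions: the variable and function names are hypothetical, and the count of how often each residue occurs is taken here from the complete sequence given above, whereas in the actual work it came from the amino acid analysis of fraction B.

```python
from collections import Counter

B_CHAIN = ("Phe Val Asp Glu His Leu CySO3H Gly Ser His Leu Val Glu Ala Leu "
           "Tyr Leu Val CySO3H Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Ala").split()

SEQUENCES = {
    1: "Phe Val Asp Glu His Leu CySO3H Gly".split(),
    2: "Gly Glu Arg Gly".split(),
    3: "Thr Pro Lys Ala".split(),
    4: "Tyr Leu Val CySO3H Gly".split(),
    5: "Ser His Leu Val Glu Ala".split(),
}

# Composition of the peptic peptide Bp3 as quoted in the text.
BP3 = {"Phe", "CySO3H", "Asp", "Glu", "Ser", "Gly", "Val", "Leu", "His"}

# Residues that occur only once in the chain act as markers for a sequence.
unique = {residue for residue, n in Counter(B_CHAIN).items() if n == 1}

# A sequence is (at least partly) present if one of its marker residues is in Bp3.
present = sorted(k for k, seq in SEQUENCES.items() if unique & set(seq) & BP3)

# A remaining sequence is ruled out if it contains a residue that Bp3 lacks.
absent = sorted(k for k, seq in SEQUENCES.items()
                if k not in present and set(seq) - BP3)


def find(fragment, chain):
    """Index of the first contiguous occurrence of fragment in chain, or -1."""
    for i in range(len(chain) - len(fragment) + 1):
        if chain[i:i + len(fragment)] == fragment:
            return i
    return -1


print(present, absent)
print(find(SEQUENCES[1], B_CHAIN), find(SEQUENCES[5], B_CHAIN))
```

Sequences 1 and 5 come out as present and sequences 2, 3 and 4 as absent, and in the complete chain sequence 5 does begin immediately after the last residue of sequence 1.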

In this work many more peptides were studied from both acid and enzymic 
hydrolysates than were actually necessary to deduce the sequence. This was
considered essential since the methods used were new and were qualitative 
rather than quantitative. The fact that all the peptides fitted into the unique 
sequence given above added further proof to its validity. 

Essentially similar methods were used to determine the sequence of fraction
A 12. Although this was the shorter of the two chains, the determination of its
structure was more difficult. Fraction B contains several residues that occur
only once in the molecule and this helps considerably in interpreting the 
results, whereas fraction A has only a few such residues and these are all near 
one end. Also fraction A is much less susceptible to enzymic hydrolysis. It is 
not attacked by trypsin and there is a sequence of thirteen residues which is 
not split by chymotrypsin or pepsin either. Considerable difficulty was at 
first experienced with the cysteic acid peptides. Fraction A contains the sequence
CySO3H • CySO3H and this gave rise to a number of very water-soluble
peptides which would not fractionate easily by paper chromatography.
However it was found that by paper ionophoresis at pH 2.75 they
could be well separated since they were the only acidic peptides present. At
this pH, -COOH groups are uncharged, -SO3H groups carry a negative
and -NH2 groups a positive charge. Peptides without cysteic acid were all
positively charged, those with one cysteic acid were neutral and could be 
separated as a group and those with two cysteic acids were negatively charged. 
If a slightly higher pH (3.5) is used for the ionophoresis, the -COOH 
groups become slightly charged and all the peptides containing one cysteic 
acid residue move slowly towards the anode and can be fractionated in this 
way. This method was found very useful for the separation and identification 
of cysteic acid peptides. Fig. 1 is a tracing of an ionogram of an acid hydro- 
lysate of fraction A carried out in this way. 
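
The grouping of peptides by charge at pH 2.75 amounts to a simple count, sketched below in Python. The function names are hypothetical, and the sketch assumes one free amino group per peptide, which is reasonable for fraction A because it contains no basic amino acids; the small charge on -COOH groups at pH 3.5 is not modelled.

```python
def net_charge_at_ph_2_75(n_cysteic, n_amino=1):
    """Approximate net charge of a peptide at pH 2.75.

    -COOH groups are taken as uncharged, each -SO3H group as -1 and
    each -NH2 group as +1, as described in the text.
    """
    return n_amino - n_cysteic


def classify(n_cysteic):
    """Group a fraction A peptide by the sign of its net charge."""
    charge = net_charge_at_ph_2_75(n_cysteic)
    if charge > 0:
        return "positive"
    if charge < 0:
        return "negative"
    return "neutral"


for n in (0, 1, 2):
    print(n, "cysteic acid residue(s):", classify(n))
```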
The sequence of fraction A was found to be: 

Gly • Ileu • Val • Glu • Glu • CySO3H • CySO3H • Ala • Ser • Val • CySO3H •
Ser • Leu • Tyr • Glu • Leu • Glu • Asp • Tyr • CySO3H • Asp.

When a protein is hydrolysed with strong acid, it gives rise not only to 
amino acids but also to a certain amount of ammonia. This is present in the 
form of amide groups on some of the aspartic and glutamic acid residues. It 
was thus necessary to determine the position of these groups 13. This was done
by studying the ionophoretic rates and amide contents of peptides derived 
from enzymic hydrolysates, since the amide groups are not split off by enzymes,
whereas they are by acid. The positions of the amide groups are
indicated in Fig. 2 by the symbols NH2.

Having determined the structure of the two chains of insulin the only 
remaining problem was to find how the disulphide bridges were arranged. 
About this time it was shown by Harfenist & Craig that the molecular weight 
of insulin was of the order of 6,000, so that it consisted of two chains con- 
taining three disulphide bridges, and not of four chains as we had originally 
thought. The fact that fraction A contained four cysteic acid residues whereas
fraction B had only two indicated that two bridges must connect the two 
chains together and one must form an intrachain bridge connecting one part 
of the A chain with another part of the same chain. 

Fig. 1. Ionophoresis of partial acid hydrolysate of fraction A at pH 3.5 showing separation
of cysteic acid peptides.

In order to determine the distribution of the disulphide bridges, it was
necessary to isolate from unoxidized insulin peptides containing intact cystine
residues. These could then be oxidized to give cysteic acid peptides
which could be recognized since they had been found in the hydrolysates of
the oxidized chains. However an unexpected difficulty arose, in that during
hydrolysis a reaction occurred which caused a random rearrangement of the
disulphide bonds, so that cystine peptides were isolated which were not actual
fragments of the original insulin and it would have appeared from the results
that every half-cystine was combined to every other half-cystine residue.

This disulphide interchange reaction could be demonstrated and studied 
using as a model system a mixture of cystine and bis-DNP cystine, which 
reacted together to give mono-DNP-cystine:

DNP-Cy-S       Cy-S            DNP-Cy-S
       |   +      |    →    2         |
DNP-Cy-S       Cy-S                Cy-S
An ether-soluble coloured substance was thus converted to a water-soluble
coloured substance and the course of the reaction could be studied by meas- 
uring the distribution of colour between ether and water. 

It was found that there were two types of disulphide interchange reactions 14.
One took place in neutral and alkaline solution and was catalyzed
by -SH compounds. It is probably due to initial hydrolysis of the disul- 
phide which then catalyzes a chain reaction: 

R1SSR2 + OH⁻ → R1S⁻ + R2SOH

R1S⁻ + R3SSR4 → R1SSR3 + R4S⁻ etc.

In neutral conditions the reaction could be inhibited by -SH inhibitors so 
that it was possible to use enzymic hydrolysis to obtain cystine peptides 15.
Thus for instance with chymotrypsin a peptide was obtained which on
oxidation gave the two cysteic acid peptides CySO3H • AspNH2 and Leu •
Val • CySO3H • Gly • Glu • Arg • Gly • Phe • Phe. The structure of the
cystine peptide was thus:

            Cy • AspNH2
            |
            S
            |
            S
            |
Leu • Val • Cy • Gly • Glu • Arg • Gly • Phe • Phe

establishing the presence of a disulphide bridge between the two half-cystine 
residues nearest the C-terminal ends of the two chains.

It was not, however, possible to determine the positions of the other two 
disulphide bonds using enzymic hydrolysis, since no enzyme would split be- 
tween the two consecutive half-cystine residues of the A chain. It was there- 
fore necessary to re-investigate the possibility of using acid hydrolysis. 

The disulphide interchange reaction that occurred in acid solution was 
found to be different from that occurring in neutral and alkaline solution and 
instead of being catalyzed by -SH compounds, was actually inhibited by 
them. Not only did this show that a different reaction was involved but it 
also made it possible to prevent it occurring during acid hydrolysis. Thus 
when insulin was treated with concentrated acid to which a small amount 
of thioglycolic acid was added cystine peptides could be isolated which were 
in fact true breakdown products and from which the distribution of the 
remaining two disulphide bonds could be deduced. These are shown in
Fig. 2, which shows the complete structure of insulin.



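Phenylalanyl (B) chain:
Phe • Val • Asp(NH2) • Glu(NH2) • His • Leu • Cy • Gly • Ser • His • Leu • Val •
Glu • Ala • Leu • Tyr • Leu • Val • Cy • Gly • Glu • Arg • Gly • Phe • Phe •
Tyr • Thr • Pro • Lys • Ala

Glycyl (A) chain:
Gly • Ileu • Val • Glu • Glu(NH2) • Cy • Cy • Ala • Ser • Val • Cy • Ser • Leu •
Tyr • Glu(NH2) • Leu • Glu • Asp(NH2) • Tyr • Cy • Asp(NH2)

Disulphide bridges (residues numbered from the N-terminal end of each chain):
A6 to A11 within the A chain, A7 to B7, and A20 to B19.

Fig. 2. The structure of insulin.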

Of the various theories concerned with protein chemistry our results sup- 
ported only the classical peptide hypothesis of Hofmeister and Fischer. The 
fact that all our results could be explained on this theory added further proof, 
if any were necessary, to its validity. They also showed that proteins are def- 
inite chemical substances possessing a unique structure in which each position 
in the chain is occupied by one and only one amino acid residue. 

Examination of the sequences of the two chains reveals no evidence of 
periodicity of any kind nor does there seem to be any basic principle which 
determines the arrangement of the residues. They seem to be put together in 
a random order, but nevertheless a unique and most significant order, since 
on it must depend the important physiological action of the hormone. 

As yet little is known about the relationship of the physiological action of 
insulin to its chemical structure. One approach to this problem was to study 
the insulins from different animal species 16, 17. Since all insulins show the
same activity it could be concluded that differences would be found only in 
parts of the molecule that were not important for activity. 

All the above results were obtained on cattle insulin. When insulins from 
four other species were studied by essentially the same methods it was found 
that the whole of the B chain was identical in all species and the only differ- 
ences were found in the three amino acids contained within the disulphide 
ring of the A chain, which in cattle are Ala • Ser • Val and in the other
species are as follows:

Pig      Thr • Ser • Ileu
Sheep    Ala • Gly • Val
Horse    Thr • Gly • Ileu
Whale    Thr • Ser • Ileu
These results suggest that the exact structure of the residues in this position
is not important for biological activity, but it does not follow that the whole
of the rest of the molecule is important.
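
The species comparison can be tabulated with a short Python sketch. The dictionary layout and variable names are illustrative assumptions; the residues are those listed above for the three positions within the disulphide ring of the A chain.

```python
# Residues at the three positions inside the disulphide ring of the A chain.
RING = {
    "Cattle": ["Ala", "Ser", "Val"],
    "Pig": ["Thr", "Ser", "Ileu"],
    "Sheep": ["Ala", "Gly", "Val"],
    "Horse": ["Thr", "Gly", "Ileu"],
    "Whale": ["Thr", "Ser", "Ileu"],
}

# For each of the three positions, the distinct residues seen across species.
variants = [sorted({residues[i] for residues in RING.values()}) for i in range(3)]

# Pairs of species whose ring residues are identical.
identical = [(a, b) for a in RING for b in RING if a < b and RING[a] == RING[b]]

print(variants)
print(identical)
```

Each of the three positions takes one of two residues across the five species, and on this evidence the pig and whale sequences cannot be told apart.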

The determination of the structure of insulin clearly opens up the way to 
similar studies on other proteins and already such studies are going on in a 
number of laboratories. These studies are aimed at determining the exact 
chemical structure of the many proteins that go to make up living matter 
and hence at understanding how these proteins perform their specific func- 
tions on which the processes of Life depend. One may also hope that studies 
on proteins may reveal changes that take place in disease and that our efforts 
may be of more practical use to humanity. 